Covid-19 India: Data Analysis

Coronavirus disease 2019 (COVID‑19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It was first identified in December 2019 in Wuhan, Hubei, China, and has resulted in an ongoing pandemic. As of 16 September 2020, more than 29.6 million cases have been reported across 188 countries and territories with more than 936,000 deaths; more than 20.1 million people have recovered.

The COVID-19 pandemic in India is part of the worldwide pandemic of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The first case of COVID-19 in India, which originated from China, was reported on 30 January 2020. India currently has the largest number of confirmed cases in Asia and the second-highest number of confirmed cases in the world after the United States, with more than 10.3 million reported cases of COVID-19 infection and more than 150,000 deaths as of 6 January 2021. Daily cases in India peaked in mid-September, with over 90,000 cases reported per day, and had fallen below 40,000 per day by December.

In July 2020, India's Ministry of Information and Broadcasting claimed that the country's case fatality rate, at 2.41%, was among the lowest in the world and "steadily declining". By mid-May 2020, six cities accounted for around half of all reported cases in the country: Mumbai, Delhi, Ahmedabad, Chennai, Pune and Kolkata. The last region to report its first case was Lakshadweep, on 19 January 2021, nearly a year after the first reported case in India. On 10 June 2020, India's recoveries exceeded active cases for the first time. Infection rates started to drop significantly in September, and the numbers of daily new cases and active cases began to decline rapidly. A government panel on COVID-19 announced in October that the pandemic had peaked in India and might come under control by February 2021. India has over 30 COVID-19 vaccine candidates in various stages of development, and a national vaccination drive was started on 16 January 2021.

Importing required libraries

Loading the Dataset

Checking basic information about the dataset

In the data summary above, we can see that the dataset has 14,654 rows and 9 columns. The summary also lists each column with its datatype (the Date column has dtype "object", which needs to be converted to "DateTime" format for the analysis) and the number of non-null values in each column. It also shows the RangeIndex used for the index axis.

Let's have a look at missing values

According to the visualisation above, there are no missing values in the dataset. However, the columns "confirmedIndianNational" and "confirmedForeignNational" are not in a proper format, so they could give misleading information about the data. Since I am not going to use these two columns in the analysis, I will drop them.
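The point above, that a "no missing values" plot can still hide bad data, is easy to reproduce. Below is a minimal sketch on a few hypothetical rows, assuming the two columns store placeholder strings such as "-" (an assumption; the real placeholder may differ). All column names other than the two being dropped are illustrative stand-ins.

```python
import pandas as pd

# Hypothetical rows; "-" stands in for a text placeholder inside a numeric column.
df = pd.DataFrame({
    "Date": ["2020-03-01", "2020-03-02", "2020-03-03"],
    "States": ["Kerala", "Kerala", "Kerala"],
    "confirmedIndianNational": ["3", "-", "-"],
    "confirmedForeignNational": ["0", "-", "-"],
    "Cured": [3, 3, 3],
    "Deaths": [0, 0, 0],
    "Confirmed": [3, 3, 3],
})

# isnull() only flags real NaN/None values, so the "-" strings count as present.
nan_counts = df.isnull().sum()

# Coercing to numbers turns unparsable placeholders into NaN and exposes them.
hidden_missing = (
    pd.to_numeric(df["confirmedIndianNational"], errors="coerce").isnull().sum()
)

# The two columns are not used later, so they are dropped.
df = df.drop(columns=["confirmedIndianNational", "confirmedForeignNational"])

print(nan_counts.sum(), hidden_missing, list(df.columns))
```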

Data Pre-processing

In our data summary table, we saw that the "Date" column has dtype object, so we need to convert it into a proper format, i.e. the datetime datatype.
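A minimal sketch of the conversion, assuming the dates are stored as `YYYY-MM-DD` strings (an assumption; check the raw file and adjust `format` if it uses day-first dates). The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the real file's date layout may differ.
df = pd.DataFrame({"Date": ["2020-01-30", "2020-02-02", "2020-02-03"],
                   "Confirmed": [1, 2, 3]})

print(df["Date"].dtype)  # string-like dtype before conversion

# An explicit format avoids day-first/month-first ambiguity;
# errors="coerce" turns any unparsable entry into NaT instead of raising.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d", errors="coerce")

print(df["Date"].dtype)
print(df["Date"].dt.month.tolist())
```

Once the column is datetime, date-based operations such as sorting, `.dt` accessors and resampling work as expected.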

Now let's take a look at the distinct states in the dataset for further analysis

Some rows have a 'States' value that is not properly defined, so we exclude those rows from the analysis. There are also some redundant state entries, which we drop as well.
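One way to do this filtering is with `Series.isin`. The sketch below is hypothetical: the labels in `undefined_labels` and `redundant_labels` are illustrative examples of "not properly defined" and "redundant" entries, not a confirmed list from the dataset, so inspect the output of `unique()` on the real data before choosing them.

```python
import pandas as pd

# Hypothetical rows mixing valid, undefined and redundant state labels.
df = pd.DataFrame({
    "States": ["Kerala", "Unassigned", "Telangana", "Telangana***",
               "Cases being reassigned to states", "Kerala"],
    "Confirmed": [10, 2, 5, 1, 3, 12],
})

print(sorted(df["States"].unique()))  # inspect every distinct label first

# Hypothetical labels chosen for removal.
undefined_labels = ["Unassigned", "Cases being reassigned to states"]
redundant_labels = ["Telangana***"]

# Keep only rows whose label is in neither list.
mask = ~df["States"].isin(undefined_labels + redundant_labels)
df = df[mask].reset_index(drop=True)

print(sorted(df["States"].unique()))
```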

Let's check the total number of confirmed cases, deaths and recovered cases in India
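If each row holds a running (cumulative) total for one state on one date, which is an assumption worth verifying on the real file, then summing every row would count the same cases many times. A minimal sketch that keeps only each state's most recent row before adding across states, using hypothetical data and column names:

```python
import pandas as pd

# Hypothetical cumulative counts: each row is a running total for a state on a date.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-05", "2021-01-06", "2021-01-05", "2021-01-06"]),
    "States": ["Kerala", "Kerala", "Goa", "Goa"],
    "Confirmed": [100, 120, 50, 55],
    "Cured": [80, 90, 40, 45],
    "Deaths": [1, 2, 0, 1],
})

# Sort by date, then take the last (latest) row of each state.
latest = df.sort_values("Date").groupby("States").tail(1)

# Add the latest cumulative figures across states to get national totals.
totals = latest[["Confirmed", "Cured", "Deaths"]].sum()
print(totals)
```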

State Report with Recovery Rate and Death Rate
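A state report like this can be sketched by dividing recoveries and deaths by confirmed cases and scaling to "per 100 cases". The figures and the column names `Cured`, `Deaths` and `Confirmed` below are hypothetical stand-ins for each state's latest cumulative snapshot.

```python
import pandas as pd

# Hypothetical latest cumulative snapshot, one row per state.
latest = pd.DataFrame({
    "States": ["Kerala", "Goa", "Punjab"],
    "Confirmed": [200, 50, 80],
    "Cured": [180, 45, 60],
    "Deaths": [2, 1, 4],
})

report = latest.set_index("States")

# Both rates are expressed per 100 confirmed cases.
report["Recovery Rate"] = (report["Cured"] * 100 / report["Confirmed"]).round(2)
report["Death Rate"] = (report["Deaths"] * 100 / report["Confirmed"]).round(2)

# Order states by case count, largest first.
report = report.sort_values("Confirmed", ascending=False)
print(report)
```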

India's Top 15 States with Highest Number of Cases

India's Top 15 States with Recovery per 100 Cases

India's Top 15 States with Deaths per 100 Cases
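The three "Top 15" rankings above can each be produced with `DataFrame.nlargest`, which sorts on one column and keeps the first n rows. The sketch uses a hypothetical report of 20 made-up states so that the top-15 cut actually removes rows.

```python
import pandas as pd

# Hypothetical state report with 20 made-up states.
report = pd.DataFrame(
    {
        "Confirmed": list(range(100, 2100, 100)),
        "Recovery Rate": [90.0 - i for i in range(20)],
        "Death Rate": [0.5 + 0.1 * i for i in range(20)],
    },
    index=[f"State {i}" for i in range(20)],
)

# Keep the 15 largest values of each metric, in descending order.
top_cases = report.nlargest(15, "Confirmed")
top_recovery = report.nlargest(15, "Recovery Rate")
top_deaths = report.nlargest(15, "Death Rate")

print(top_cases.index[:3].tolist())
```

Each result can then be passed to a bar chart; the ranking step is the same for all three.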

Map Visualisation

Total Number of Confirmed Cases in Each State

Total Number of Active Cases in Each State
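The section does not state how active cases are derived. A common definition, assumed here, is confirmed minus recovered minus deaths. A minimal sketch on a hypothetical latest snapshot:

```python
import pandas as pd

# Hypothetical latest cumulative snapshot per state.
latest = pd.DataFrame({
    "States": ["Kerala", "Goa"],
    "Confirmed": [200, 50],
    "Cured": [180, 45],
    "Deaths": [2, 1],
})

# Assumed definition: cases that have neither recovered nor died.
latest["Active"] = latest["Confirmed"] - latest["Cured"] - latest["Deaths"]
print(latest[["States", "Active"]])
```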

Total Number of Recovered Cases in Each State

Total Number of Deaths in Each State

Fatality-Ratio

Fatality-Ratio in Different States

Fatality-Ratio Over Time

Cure-Ratio in Different States

Cure-Ratio Over Time
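For the national ratios over time, one approach is to add the cumulative state figures for each date and then divide, rather than averaging per-state ratios (which would give small states the same weight as large ones). A sketch with hypothetical data, assuming cumulative columns `Confirmed`, `Cured` and `Deaths`:

```python
import pandas as pd

# Hypothetical cumulative figures for two states on two dates.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-06-01", "2020-06-01", "2020-06-02", "2020-06-02"]),
    "States": ["Kerala", "Goa", "Kerala", "Goa"],
    "Confirmed": [100, 100, 150, 250],
    "Cured": [50, 30, 90, 110],
    "Deaths": [2, 2, 3, 5],
})

# National cumulative totals per date.
daily = df.groupby("Date")[["Confirmed", "Cured", "Deaths"]].sum()

# Ratios per 100 confirmed cases on each date.
daily["Fatality Ratio"] = daily["Deaths"] * 100 / daily["Confirmed"]
daily["Cure Ratio"] = daily["Cured"] * 100 / daily["Confirmed"]
print(daily)
```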

Time Series Analysis

Covid-19 Rise of Confirmed Cases Datewise

Covid-19 Number of Recovered Cases in India Datewise

Covid-19 Number of Deaths in India Datewise
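The datewise plots can show either cumulative totals or new counts per day. Assuming the underlying rows are cumulative, a sketch that builds the national datewise series and derives daily new counts with `diff()` (the first date has no previous value, so it is dropped). The data are hypothetical.

```python
import pandas as pd

# Hypothetical cumulative figures for two states over three dates.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-09-01", "2020-09-01", "2020-09-02",
                            "2020-09-02", "2020-09-03", "2020-09-03"]),
    "States": ["Kerala", "Goa"] * 3,
    "Confirmed": [100, 50, 130, 60, 170, 65],
    "Cured": [60, 30, 80, 40, 100, 45],
    "Deaths": [1, 0, 1, 1, 2, 1],
})

# National cumulative series, one row per date.
datewise = df.groupby("Date")[["Confirmed", "Cured", "Deaths"]].sum()

# Day-over-day change in the cumulative totals = new counts per day.
new_per_day = datewise.diff().dropna().astype(int)
print(new_per_day)
```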

Let's Have a Look at Testing Details

Top 15 States with Total Sample Collections

Top 15 States with Positive Cases

COVID Tests Done vs Positive Cases for Every 1000 tests
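The "per 1000 tests" figure is positive cases divided by samples tested, scaled by 1,000. A sketch on a hypothetical latest snapshot per state; the column names `TotalSamples` and `Positive` are assumptions about the testing file, and if it is cumulative the latest row per state should be used, as with the case data.

```python
import pandas as pd

# Hypothetical latest cumulative testing figures, one row per state.
testing = pd.DataFrame({
    "State": ["Kerala", "Goa", "Punjab"],
    "TotalSamples": [200000, 50000, 80000],
    "Positive": [10000, 4000, 2000],
})

# Positive cases found for every 1,000 samples tested.
testing["Positive per 1000 Tests"] = testing["Positive"] * 1000 / testing["TotalSamples"]

# Rank states by positivity, highest first.
ranked = testing.sort_values("Positive per 1000 Tests", ascending=False)
print(ranked)
```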

Now let's have a look at the vaccination dataset

Loading the dataset

Let's look at the basic information about the data.

In the data summary above, we can see that the dataset has 4,255 rows and 18 columns. The summary also lists each column with its datatype (the Date column has dtype "object", which needs to be converted to "DateTime" format for the analysis) and the number of non-null values in each column. It also shows the RangeIndex used for the index axis.

Checking Missing Values

There are many missing values across the entire dataset, and a few columns (such as AEFI and the age-group columns) are almost 50% empty. Before diving into EDA, we need to deal with these missing values and drop the columns we don't need for further analysis.
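One way to act on this is to measure the percentage of missing values per column and drop columns above a threshold. The 50% cut-off, the sample rows and the column names below are hypothetical. Dropping the remaining incomplete rows is just one option; forward-filling within each state is an alternative for cumulative data.

```python
import numpy as np
import pandas as pd

# Hypothetical vaccination rows with gaps.
vacc = pd.DataFrame({
    "State": ["Kerala", "Goa", "Punjab", "Assam"],
    "Total Doses Administered": [100.0, 50.0, np.nan, 80.0],
    "AEFI": [np.nan, np.nan, 3.0, np.nan],
    "18-44 Years": [np.nan, 10.0, np.nan, 20.0],
})

# Share of missing values per column, in percent.
missing_pct = vacc.isnull().mean() * 100

# Hypothetical rule: drop columns that are at least half empty.
to_drop = missing_pct[missing_pct >= 50].index
vacc = vacc.drop(columns=to_drop)

# Drop rows that still have gaps (one possible choice).
vacc = vacc.dropna()
print(missing_pct)
print(vacc)
```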

Data Pre-processing

Male and Female Vaccination Ratio
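The male/female split can be computed from the national totals of individuals vaccinated by sex. The sketch uses hypothetical column names and figures, and ignores any other gender category the real data may contain.

```python
import pandas as pd

# Hypothetical latest snapshot of individuals vaccinated, by sex, per state.
latest = pd.DataFrame({
    "State": ["Kerala", "Goa"],
    "Male (Individuals Vaccinated)": [600, 300],
    "Female (Individuals Vaccinated)": [400, 200],
})

male = latest["Male (Individuals Vaccinated)"].sum()
female = latest["Female (Individuals Vaccinated)"].sum()

# Percentage share of each sex among the two groups.
share = pd.Series({"Male": male, "Female": female}) * 100 / (male + female)
ratio = male / female  # males vaccinated per female vaccinated
print(share, ratio)
```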

Covaxin and Covishield Vaccination

Doses administered vs People Vaccinated
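Doses administered exceed people vaccinated because some people have received a second dose. A sketch comparing the two, with hypothetical column names and figures. The "second doses" column assumes a two-dose schedule, so that every dose beyond the first per person is a second dose.

```python
import pandas as pd

# Hypothetical latest snapshot per state.
latest = pd.DataFrame({
    "State": ["Kerala", "Goa"],
    "Total Doses Administered": [1500, 600],
    "Total Individuals Vaccinated": [1000, 500],
})

# Average doses received per vaccinated person.
latest["Doses per Person"] = (
    latest["Total Doses Administered"] / latest["Total Individuals Vaccinated"]
)

# Assuming at most two doses each, the excess doses are second doses.
latest["Second Doses"] = (
    latest["Total Doses Administered"] - latest["Total Individuals Vaccinated"]
)
print(latest)
```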

Total Number of Vaccinations in Each State

Total Number of Doses Administered in Each State

Total Sessions Conducted in Each State

Total Covaxin Doses Administered in Each State

Total CoviShield Doses Administered in Each State

Total Doses Administered in Indian States Datewise

Please note: This notebook is inspired by GitHub, other Kaggle kernels and Stack Overflow queries. Due respect and credit to all the Kagglers.